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Abstract 


Despite broad applications of growth curve models, few studies have dealt with a practical issue -- 
nonnormality of data. Previous studies have used Student’s ¢ distributions to remedy the 
nonnormal problems. In this study, robust distributional growth curve models are proposed from a 
semiparametric Bayesian perspective, in which intraindividual measurement errors follow 
unknown random distributions with Dirichlet process mixture priors. Based on Monte Carlo 
simulations, we evaluate the performance of the robust semiparametric Bayesian method and 
compare it to the robust method using Student’s ¢ distributions as well as the traditional 
normal-based method. We conclude that the semiparametric Bayesian method is more robust 
against nonnormal data. An example about the development of mathematical abilities is provided 
to illustrate the application of robust growth curve modeling, using school children’s Peabody 
Individual Achievement Test mathematical test scores from the National Longitudinal Survey of 
Youth 1997 Cohort. 

Keywords: Semiparametric Bayesian, Robust method, Dirichlet process mixture, Growth 


curve modeling. 
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Robust Bayesian Approaches in Growth Curve Modeling: Using Student’s ¢ Distributions versus 


a Semiparametric Method 


Growth curve modeling is one of the most frequently used analytical techniques for 
longitudinal data analysis (e.g., McArdle and Nesselroade, 2014; Meredith and Tisak, 1990). In 
growth curve modeling, repeated measures of dependent variables are represented as a function of 
time and possible covariates, and the function mean is the mean growth. Individual variations 
around the mean growth curve are due to random effects and intraindividual measurement errors. 
Traditional growth curve analysis typically assumes that the random effects and intraindividual 
measurement errors are normally distributed. Although the normality assumption makes growth 
curve models easy to estimate, data in social and behavioral sciences are commonly collected 
using surveys or questionnaires and thus often are nonnormal (Cain et al., 2017; Micceri, 1989) 
because of nonnormal population distributions or data contamination. Ignoring the nonnormality 
of data may lead to inefficient or even biased parameter estimates, and statistical inferences based 
on common test statistics and fit indices could be misleading (e.g., Maronna, Martin, and Yohai, 
2006; Yuan and Bentler, 2001). In this article, we propose a semiparametric Bayesian method to 
handle the nonnormality issue in growth curve modeling and compare the proposed method to 
existing robust Bayesian approaches using Student’s rf distributions. 

Researchers have become more and more keenly aware of the large influence that 
nonnormality has upon model estimation (e.g., Hampel, Ronchetti, Rousseeuw, and Stahel, 1986; 
Huber, 1981; Yuan, Bentler, and Chan, 2004) and have developed strategies aiming to provide 
reliable parameter estimates and inferences when the normality assumption is violated. A 
straightforward and feasible strategy is to either transform the data so that they are close to being 
normally distributed, or directly delete potential outliers before data analysis. However, data 
transformation often makes the interpretation of the model estimation results complicated. 
Simply deleting outliers may reduce efficiency as the resulting inferences may fail to reflect 
uncertainty in the exclusion process (e.g., Lange, Little, and Taylor, 1989). Moreover, diagnosing 


multivariate outliers is a challenging task (e.g., Filzmoser, 2005; Pefia and Prieto, 2001). Tong 
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and Zhang (2017) proposed six methods to detect outlying observations in growth curve modeling 
and concluded that the greatest chance of success comes from the use of multiple methods, 
comparing their results and making a decision based on research purposes. Therefore, 
alternatively, many researchers (e.g., Savalei and Falk, 2014; Yuan and Zhang, 2012) have 
recommended the application of robust methods and statistics to protect data from being distorted 
by the presence of outliers or nonnormality. These methods either downweight the potential 
outliers as a transformation technique (e.g., Yuan and Bentler, 1998) or assume that the data come 
from certain nonnormal distributions such as a ¢ distribution or a mixture of normal distributions 


(e.g., Asparouhov and Muthén, 2016; Muthén and Shedden, 1999; Tong and Zhang, 2012). 


Recently, robust methods from Bayesian perspectives have drawn growing interest because 
Bayesian methods have many advantages. First, estimating models with complex structures often 
involves high dimensional integration and thus is computationally intensive. Sampling methods 
such as Markov Chain Monte Carlo (MCMC) under the Bayesian framework can handle this 
problem relatively easily. Second, Bayesian estimation can conveniently infer parameters that do 
not have symmetric distributions (e.g., variance parameters), whereas it is difficult or 
computationally intensive to capture the asymmetric nature for parameters using frequentist 
methods. Third, with Bayesian methods, prior information can be incorporated via informative 
priors to make parameter estimates more efficient. Furthermore, Bayesian methods naturally 
accommodate missing data without requiring new techniques for inference and missing data can 
be taken into account at the same time as parameter estimation. Because of these strengths, more 


and more Bayesian estimation methods are employed in robust analysis. 


From the Bayesian perspective, one approach of robust methods to account for 
nonnormality is to replace normal distributions by Student’s ¢ distributions in the model as the 
degrees of freedom of t distribution can control the robustness (Lange, Little, and Taylor, 1989). 
For example, Pinheiro, Liu, and Wu (2001) proposed a robust version of the linear mixed-effect 
model to remedy the distributional deviation from the normality assumption, in which normal 


distributions for the random effects and measurement errors were both replaced by multivariate t 
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distributions. A few studies have directly discussed this approach in growth curve analysis. Tong 
and Zhang (2012) and Zhang, Lai, Lu, and Tong (2013) suggested modeling heavy-tailed data and 
outliers in growth curve modeling using ¢ distributions and provided online software to conduct 
the robust analysis. The two articles demonstrated that the robust growth curve modeling based 
on ¢ distributions is easy to understand and implement, and thus potentially would greatly 
promote the adoption of robust growth curve analyses. However, although there are many 
advantages in using ¢ distributions for robust data analysis (e.g., Tong and Zhang, 2012), 
Student’s ¢ distribution has a parametric form and thus still has a restriction on the distribution of 
data. For example, using ¢ distributions may be sensitive to skewed data or mixture data, or even 
break down under some circumstances (e.g., Azzalini and Genton, 2008; Zhang, 2013). Note that 
researchers have proposed robust methods based on skew-normal distributions (e.g., Asparouhov 
and Muthén, 2016; Zhang, 2013) to overcome the problem of skewed data. However, again, 
skew-normal distributions have parametric forms and are limited to describing certain shape of 
distributions. Growth mixture models, first introduced by Muthén and Shedden (1999), provide 
another useful approach to reduce the influence of the nonnormality problem. These models 
assume that individuals can be grouped into a finite number of classes having distinct growth 
trajectories. Although growth mixture models are flexible, some difficult issues, including choice 
of the number of latent classes and selection of growth curve models within each class, have to be 
tackled. The typical strategy fixes the number of latent classes in advance at a small value (e.g., 
2-4), models the growth trajectories parametrically with a polynomial function, and assesses 
model fits using criteria such as AIC, BIC, and likelihood-based tests (Nylund, Asparoubov, and 
Muthén, 2007). Semiparametric Bayesian methods, sometimes referred to as nonparametric 
Bayesian methods (e.g., Gershman and Blei, 2012; Miiller and Mitra, 2004), provide a different 
approach to this problem. Rather than comparing models that vary in complexity, semiparametric 
Bayesian methods are to fit a single model that can adapt its complexity to the data and allow the 


complexity to grow as more data are observed. 


While parametric models can only capture a bounded amount of information from data, 
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semiparametric Bayesian models allow for a richer and larger class of models. Miiller and Mitra 
(2004) pointed out that restriction to a parametric family can mislead investigators into an 
inappropriate illusion of posterior certainty. On the contrary, semiparametric Bayesian methods 
are adaptive and have proven to be a valuable tool for discovering complicated patterns in data 
due to their great flexibility. There are two typical building blocks for semiparametric Bayesian 
models: Gaussian process and Dirichlet process, where Gaussian process is a distribution over 
functions that can be used for modeling functions and classification, and Dirichlet process is a 
distribution over probability measures that can be used for density estimation and clustering. For 
robust analysis against nonnormality, semiparametric Bayesian methods with Dirichlet process 
priors are desirable, by viewing latent variables or measurement errors as from unknown random 
distributions. Fueled by the MCMC ideas and the development of Bayesian software (e.g, Lunn, 
Jackson, Best, Thomas, and Spiegelhalter, 2013), many researchers have discussed the advantages 
and flexibility of using the semiparametric Bayesian methods (e.g., Fahrmeir and Raach, 2007; 
Ghosal et al., 1999; Hjort, 2003; Hjort et al., 2010; Miiller and Mitra, 2004; MacEachern, 1999) 
and have applied these methods to models with complex structures. For example, Bush and 
MacEachern (1996), Kleinman and Ibrahim (1998), and Brown and Ibrahim (2003) used Dirichlet 
process mixtures for random effects distributions. Ansari and Iyengar (2006) used Dirichlet 
components to define a semiparametric dynamic choice model. Burr and Doss (2005) used a 
conditional Dirichlet process for the random effects distribution within a meta-analysis 
application. Dunson (2006) used dynamic mixtures of Dirichlet process to allow a latent variable 
distribution to change nonparametrically across groups. For categorical data analysis, Dirichlet 
process mixtures of multinomial distributions have been studied and applied to missing data 
through multiple imputation techniques to capture complex dependencies especially in high 
dimensions (Si and Reiter, 2013; Si et al., 2015). Semiparametric Bayesian approach has also 
been developed for structural equation models to relax the assumption that the distribution of the 
latent variables is normal (e.g., Lee, Lu, and Song, 2008; Yang and Dunson, 2010). As far as we 


are aware of, measurement errors or residuals in these models are still normally distributed 
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although it is pointed out in Miiller and Mitra (2004) that the nonparametric model extension can 
go towards the direction of nonparametric residual distributions. 

Despite the popularity of growth curve modeling, the prevalence of nonnormal data, and the 
flexibility of semiparametric Bayesian methods, no study has been directly conducted on 
semiparametric Bayesian growth curve modeling. Therefore, the main contributions of this article 
is to (1) propose a semiparametric Bayesian approach for growth curve modeling to relax the 
normality assumption imposed in traditional normal-based analysis (especially on measurement 
errors); and (2) evaluate the performance of the semiparametric Bayesian method and compare it 
with the robust method using Student’s ¢ distributions which is a parametric analysis. The robust 
method using Student’s t distributions is selected for comparison because it is relatively more 
broadly used in practice as statistical software has been developed for it to facilitate the 
implementation. In the next section, we briefly review the traditional Bayesian growth curve 
modeling and the robust approach using Student’s ¢ distributions. After that, robust 
semiparametric Bayesian growth curve modeling is proposed. In addition, the comparison 
between semiparametric Bayesian models and finite growth mixture models is discussed. Then in 
the subsequent section, simulation studies are carried out to show the effectiveness of 
semiparametric Bayesian methods in comparison to the traditional growth curve modeling as well 
as the robust growth curve modeling using Student’s ¢ distributions. Finally, we illustrate the 
application of the semiparametric Bayesian methods through an example with the Peabody 
Individual Achievement Test math data from the National Longitudinal Survey of Youth 1997 
Cohort (Bureau of Labor Statistics, U.S. Department of Labor, 2005). We end the article by 


summarizing our findings with recommendations. 


Robust Bayesian Growth Curve Modeling 
Bayesian growth curve modeling: a brief review 


Growth curve models are used to analyze longitudinal data in which the same subjects are 


observed repeatedly over time on the same tests. Let y; = (yii,..., yrr)/ bea T x 1 random 
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vector and y;; be an observation for individual 2 at time j (2 = 1,...,N;j7 =1,...,7). Here N is 
the sample size and T’ is the total number of measurement occasions. A typical form of 


unconditional growth curve models can be expressed as 


yi Ab; + ej, 


b; aa B+ ui, 


where A is a 7’ x q factor loading matrix determining the growth trajectories, b; is aq x 1 vector 
of random effects, and e; is a vector of intraindividual measurement errors. The vector of random 
effects b; varies for each individual, and its mean, §, represents the fixed effects. The residual 
vector u; represents the random component of b;. Because £ is constant across individuals, b; 
and u; share the same type of distribution with different means. 

Traditional growth curve models typically assume that both e; and u; follow multivariate 
normal distributions such that e; ~ M Nr(0,®) and u; ~ MN,(0, ©), where MN denotes a 
multivariate normal distribution and the subscript denotes its dimension. The 7’ x 7’ matrix ® and 
the g x q matrix W represent the covariance matrices of e; and u;, respectively. The 
intraindividual measurement error structure is usually simplified to ® = 071 where o? is a scale 
parameter. By this simplification, we assume homogeneity of error variance across time and 
measurement errors are uncorrelated at different time points. Given the current specification of 
u;, b; ~ MN,(8, ®). 


Special forms of growth curve models can be derived from the preceding form. For 


example, if 
1 0 
1 1 ie B o oo 
A= b; = a = L p= L LS 
eae tar ee og }’ 
1 T-1 


the model represents a linear growth curve model with random intercept (initial level) L; and 
random slope (rate of change) S;. The average intercept and slope across all individuals are 6, 


and (gs, respectively. In W, ead and oe represent the variability (or interindividual differences) 
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around the mean intercept and the mean slope, respectively, and os represents the covariance 
between the latent intercept and slope. 

To estimate growth curve models, Bayesian methodology can be applied. Bayesian 
methods for complex data analysis have been made popular in the past few decades because of its 
advantages as described previously (e.g., Lee and Shi, 2000; Lee and Song, 2004; Lee and Xia, 
2008; Serang et al., 2015; Zhang et al., 2007b; Tong and Zhang, 2012). The basic idea of 
Bayesian methods is to obtain the posterior distributions of model parameters by combining the 
likelihood function and the priors. When priors are uninformative or weakly informative, the 
likelihood dominates and thus results from Bayesian estimation are similar to those from 
maximum likelihood estimation. For a typical unconditional growth curve model, we define the 
joint prior distribution of model parameters by p(8, ®, ©) and denote the likelihood function as 


L. The joint posterior distribution of model parameters is 


p(B, ®, Wly:) x | p(B, ®, W) x L db, 


where b = (bj,..., by)’. This integral is difficult to solve in practice. Instead, MCMC methods 
(e.g., Gibbs sampling; Robert and Casella, 2004) are often applied to obtain parameter estimates 
and statistical inferences. We first obtain conditional posterior distributions for the parameters, 
then by iteratively drawing samples from the conditional posterior distributions, we obtain 
empirical marginal distributions of the model parameters and make statistical inferences based on 
the empirical marginal distributions (Geman and Geman, 1984). Detailed algorithm can be found 


in Song and Lee (2012), Zhang et al. (2013), etc. 


Robust Bayesian growth curve modeling using Student’s ¢ distributions 


The traditional growth curve analysis discussed above is based upon the normality 
assumption of random effects and intraindividual measurement errors. However, practical data in 
social and behavioral sciences are rarely normal due to data contamination or nonnormal 
population distributions. Without taking the nonnormality problem into consideration, we may 


obtain inefficient or even incorrect parameter estimates in model estimation (e.g., Yuan and 
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Bentler, 2001; Yuan and Zhang, 2012). Studies to deal with the adverse effects of nonnormality 
on parameter estimates, standard errors, and test statistics have been carried out in growth curve 
analysis. In a growth curve model, the nonnormality may occur in the random effects, in the 
intraindividual measurement errors, or both (Pinheiro et al., 2001). Motivated by a real dataset 
from an orthodontic study, Pinheiro et al. proposed a robust hierarchical linear mixed-effects 
model in which the random effects and the intraindividual errors follow multivariate t 
distributions, with known or unknown degrees of freedom. 

By using multivariate ¢ distributions, extreme values in a dataset can be downweighted. 
Suppose k dimensional data x follows a multivariate ¢ distribution, with k x 1 location vector p, 
k x k shape matrix &, and degrees of freedom v, denoted by MT (1, H, v). The probability 
density function of x is 


_ Ty +k)/2] 7 Ty-1 eG 
p(x|u, &, v) — D(v/2)v'/2nk/2|y3|1/2 4 7 (x Ht) > (x — p) . 


The maximum likelihood estimates of model parameters 6 satisfy 


YY a A.D 6)>1 xX; —p 6 i) = 0 (Lange et al., 1989), where N is the sample size, A; is the 
i=1 a g 


matrix of partial derivatives of 4(@); with respect to 8, and w; = “*% is the weight assigned to 


a 


case 7 (7; is the dimension of 0 for case 7, and Of is the squared Mahalanobis distance 

52 = (x; — y;)’ Dy! (x; — p2;) for case i). Thus, potential outliers can be downweighted in the 
model estimation process because lower weights will be assigned to cases with large squared 
Mahalanobis distances, given fixed degrees of freedom v and dimensions 7;. 

From a Bayesian perspective, Tong and Zhang (2012) proposed four types of distributional 
growth curve models where the random effects u; and intraindividual measurement errors e; may 
follow either multivariate normal or ¢ distributions. They concluded that four types of 
distributional growth curve models imply very different patterns in growth trajectories, and thus 
given an empirical data set, it is very important to specify the correct type of growth curve 
models. Lu and Zhang (2014) expanded the study to further conduct a robust growth mixture 


analysis with nonnormal missing data. 


Although there are many advantages in using ¢ distributions for robust data analysis (e.g., 


ROBUST BAYESIAN APPROACHES IN GCM 11 


Lange, Little, and Taylor, 1989), a ¢ distribution is symmetric and thus still has a restriction on the 
distribution of the data. Constraining inference to a specific parametric form may limit the scope 
of inferences in many situations. For example, the shape of the assumed population distribution is 
restricted so that it may not approximate the data well enough. Also, the complexity of the model 
is fixed. If more data are collected later bringing in diversity and more characteristics, a different 
model may need to be adopted. This could impose technical challenges when researchers want to 
synthesize research findings from different studies (e.g., meta-analysis). Consequently, we 
propose to use semiparametric Bayesian methods/models in this article because there is no need 
to make arbitrary and unverified distributional assumptions for the latent variables or the 
measurement errors as in the parametric modeling. We expect that semiparametric Bayesian 
models perform equally well as parametric models when the parametric models are correctly 


specified, and outperform the parametric models when they are mis-specified. 


Semiparametric Bayesian growth curve modeling 


Semiparametric Bayesian methods, sometimes also called nonparametric Bayesian methods 
in the literature, are based on distributions over spaces of distributions and are flexible in 
modeling the nonnormality (e.g., Tong, 2014; Lee et al., 2008). Unlike typical classical 
nonparametric methods such as rank and permutation tests, semiparametric Bayesian methods 
can provide full probability models for the data-generating process and provide posterior 
distributions of model parameters. The semiparametric Bayesian approach to solving problems is 
rapidly gaining popularity among practitioners as theoretical properties become increasingly 
better understood and computational hurdles are being removed (e.g., Neal, 2000). Below we first 
introduce Dirichlet process priors and then develop semiparametric Bayesian methods in growth 
curve modeling. 

Dirichlet process (DP) priors. Within the semiparametric Bayesian scope, the traditional 


parametric assumption of a random vector € [e.g., € ~ N(pee, Be)] is replaced by 


Eo G; 
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where G is an unknown distribution function and its prior is a distribution over the distribution G. 
The Dirichlet process (DP, Ferguson, 1973, 1974) is a widely used prior for unknown 
distributions. DP generates random distributions and is characterized by two hyperparameters, a 
and Go. Go is a base distribution, which represents the central or “mean” distribution in the 
distribution space, while the precision parameter a governs how close realizations of G are to Gy. 
For any measurable partitions P,,..., P, of the sample space 1’, (G(P,),...,G(P,)) follows a 
Dirichlet distribution Dirichlet(aGo(P,),...,a@Go(P,)). For example, if V is the real space and 


P = (—oo, a] where x is a real number, then 
G(x) ~ Dirichlet(aGo(x), a(1 — Go(x))). 
Thus, 


E(G(«)) 


Go(x), 
Go(a)(1 = Go(x)) 


Var(G(a)) = Aad 


Ferguson (1973) pointed out that DP is a conjugate prior and has two desirable properties: (1) its 
support is sufficiently large, and (2) the posterior distribution is analytically manageable. He 
further explained that the posterior of G is DP(a, Go), where @ = a + N and 


: a N 
Go = Go4 Gn, 
oT at NT atN ON 


where G'y is the empirical distribution function of the data. Thus, the posterior point estimate of 
G, E(G|data) = Go, is a weighted average of two distributions: Gy and Gy. If a = 0, the 
posterior point estimate is Gy, which is nonparametric. When a approaches to infinity, the 
posterior point estimate approaches to G'o, which is parametric. In practice, the hyperparameter a 
is usually specified to follow Gamma(aj, az), which is neither 0 nor infinity. As suggested by 
Ishwaran (2000), a, = a2 = 2 is usually selected to encourage both small and large values of a. 
Thus, we consider the posterior point estimate of G' as semiparametric. 

Stick-breaking construction. There are different representations of DP and the most 


common one is based on the stick-breaking construction, developed by Sethuraman (1994). Let 
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G1; 2;-++;Uk;--. ~ Beta(1,a). Define 


k-1 


Pk = dk [a = Oy) 


j=l 


Then, 


G = Yo prdex, 


k=1 


where d¢- is the Dirac probability measure and €; ~ Go. Note that )72 px = 1 so it guarantees 
G to be a distribution. Sethuraman showed that draws from the stick-breaking are indeed DP 
distributed under general conditions. From this constructive way, we can tell that G(-) is a 
discrete distribution as 


i, withp=pi 


£5, with p= peo 


Ei, with p = pr 


The discrete nature of DP was also described by Blackwell and MacQueen (1973) through the 
Polya urn scheme. Thus, in a density estimation problem, where y; ~ G(-), i= 1,..., N, and y; 
is exchangeable (i.e., independently drawn from some unknown distribution), it would be 
inappropriate to assume G ~ DP if the distribution is known to be absolutely continuous. To 
define a continuous distribution, a convenient way is to use DP as the basis of a mixture model, 
for example, a mixture of N(j1;, 07) with mixing proportions defined by p;,. This model is known 


as the DP mixture model (Antoniak, 1974; Escobar and West, 1995): 
yl@; ~ F(A), 
G ~ DP(a, Go). 


Theoretically, there are infinite number of mixture components as k = 1,..., 00, given an 


arbitrarily flexible choice of distributional shapes. In practice, a finite number of mixture 
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components would be good enough, and this number is taken into account by DP. Smaller values 
of DP precision parameter a result in a smaller number of mixture components. If a@ approaches 


infinity, there would be N mixture components, one associated with each observation. 


Semiparametric Bayesian methods in growth curve modeling. Because 
semiparametric Bayesian methods are flexible in modeling the nonnormality, we propose to use 
them for robust growth curve modeling. The parametric assumptions of the intraindividual 
measurement errors e; (either normal or ¢ distributed) are replaced by an unknown random 
distribution G.. From a Bayesian perspective, a prior needs to be specified for G.. Because 


intraindividual measurement errors e; follow a continuous distribution, we use a DP mixture prior: 


e;|®; ~ MNr(0, ®j), 


G ~ DP(a,G)). 


That ise; ~ G., and G. ~ DPM is a mixture of multivariate normal distributions where the 
mixing measure has a Dirichlet process prior. Note that the mean of intraindividual measurement 
errors e;|®; is set to be 0 when the growth curve model is correctly specified. We can obtain the 
distribution G, by the truncated stick-breaking construction (e.g., Lunn, Jackson, Best, Thomas, 
and Spiegelhalter, 2013; Sethuraman, 1994). Assume that the data can be represented by a 
maximum of C' (1 < C < N) possible mixture components. For qi, q2,...,¢c ~ Beta(1, a), 


define p, = 4 Wed —q;),k =1,...,C. Then, the mixing proportion p;, is obtained by 
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to satisfy that ye Dr = 1. Thus, the unknown distribution G, can be constructed below, which 


is a mixture of multivariate normal distributions: 
MN(0,®), withp=p1 


MN(0,8), with p= ps 
G. = ’ 


MN(0,®), with p= po 
where ®) k =1,...,C are parameters of the multivariate normal distribution in the kth 
component. Inverse Wishart priors for ®“), p(@)) = IW (no, Wo), are used, where no and Wo 
are hyperparameters. Following Lunn et al. (2013, page 294), we can fix the shape parameter 79 
at a specific number and assign the uncertain scale matrix Wo an inverse Wishart prior. 

In general, G. is a symmetric distribution centered at 0. Samples drawn from G, can be 
multimodal. G, can approximate a normal distribution, a ¢ distribution, or other distributions. So 
G, may fit the data better than a ¢ distribution. The measurement error for the 7th individual, e;, 
comes from MM N(0, ®“)) with the probability p,. We should point out that the measurement 
errors for different individuals may be drawn from the same component. If e;,7 = 1,..., N are 
from K different distributions among / N(0, pl). k =1,...,C, K is the number of clusters 
for e;. Clearly, K < C, and within each cluster, e;s come from the same distribution. 

Recall that in the traditional growth curve model, 3, ®, and W are the model parameters. In 
the semiparametric Bayesian growth curve model, @ and W are still model parameters and can be 
estimated in the same way. However, instead of estimating ® (the covariance matrix of e;) as in 
the traditional model, we obtain e; and ’. The estimate of K indicates the heterogeneity of 
intraindividual measurement errors e;. With a larger value of K,, we are more confident to 
conclude that different individuals’ measurement errors are distributed differently. West (1992) 


pointed out that asymptotically follows a Poisson distribution 
K=1+4+42, «~ Poisson (a(y+logN)), 


where 7¥ is Euler’s constant. Table 1 gives different percentiles of the distribution of A’. With a 
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larger value of a and a larger sample size N, it is more likely to have more clusters. To obtain an 
estimate of ®, we let e;,),72 = 1,..., N be the observations of e; simulated from the posterior 
distribution in the sth Gibbs sampler iteration, and let ®(,) be the corresponding sample 
covariance matrix. An estimate of ® can be taken as the mean of ®,,), averaging over all the 


Gibbs sampler iterations after the burn-in period. 


In the present study for the semiparametric Bayesian growth curve modeling, only the 
distribution of intraindividual measurement errors is replaced by an unknown distribution with the 
DP mixture prior. A similar strategy can be taken for the random effects u;. In this article, we 
focus on the model described above because it is more comparable to the robust Bayesian method 
proposed in Zhang, Lai, Lu, and Tong (2013) and the comparison will be discussed in the next 
section. Information about semiparametric Bayesian growth curve models with random effects 
following a random distribution is available in Tong (2014). Tong (2014) concluded that it is 
important to specify the correct type of model for practical data analyses. Outlying observation 
diagnostic methods in growth curve modeling (Tong and Zhang, 2017) have been developed to 
distinguish outliers (extreme scores in measurement errors) and leverage observations (extreme 
scores in random effects). Such methods can be used first to figure out whether the random effects 
or the measurement errors follow normal distributions, so that we can then select the correct type 


of models and conduct data analysis. 


Semiparametric Bayesian growth curve modeling versus growth mixture modeling. 
For the semiparametric Bayesian growth curve models, the traditional normal distribution of 
intraindividual measurement errors is replaced by a random distribution with a DP mixture prior. 
Basically, the random distribution is a mixture of multivariate normal distributions with the 
mixing proportions generated following certain rules (e.g., stick-breaking construction). Similar 
to the growth mixture modeling, semiparametric Bayesian growth curve modeling also identify 
multiple unobserved sub-populations/clusters and describing longitudinal change within each 
unobserved sub-population/cluster. However, the semiparametric Bayesian growth curve 


modeling is different from the growth mixture modeling in several ways. 
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First, the traditional mixture modeling approach to clustering requires the number of 
clusters to be determined by comparing models with different number of clusters. Although 
substantive researchers typically use a combination of criteria (e.g., AIC and BIC) to guide the 
decision, there is not common acceptance of the best criteria despite various suggestions (Nylund 
et al., 2007). The Bayesian semiparametric approach sidesteps the problem of finding the correct 
number of clusters by assuming infinitely many mixture components. It provides a compelling 
alternative to the traditional finite mixture model paradigm and can automatically infer an 
adequate model size/complexity from the data, without explicitly doing Bayesian model 
comparisons. The risks of using too few classes are reduced while fully estimating uncertainty in 
posterior distributions. Second, in growth mixture modeling, the number of clusters is fixed, 
whereas in the semiparametric Bayesian growth curve modeling, the actual number of clusters 
used to model data is not fixed, and can be automatically inferred from data using the Bayesian 
posterior inference framework. So the model complexity is part of the posterior distribution. 
Semiparametric Bayesian modeling provides a way of getting very flexible models since inflexible 
models (e.g., a mixture of 5 Gaussians) may yield unreasonable inferences. Third, for growth 
mixture models, adding one additional cluster brings in more parameters to be estimated. Thus, it 
is not practically possible to have many clusters when we conduct growth mixture analyses. In 
contrast, it is often not a problem to obtain a large number of clusters if we use the semiparametric 
Bayesian method. Because of the hierarchical feature of Bayesian modeling, parameters for the 
Dirichlet process are of interest since they can determine the clusters no matter how many clusters 
there are. Fourth, semiparametric Bayesian growth curve modeling allows future data to exhibit 
previously unseen clusters. As Dunson (2013) pointed out, when semiparametric Bayesian 
methods are applied in practice, the prior and the penalty that comes in through integrating over 
the prior in deriving the marginal likelihood tends to lead to allocation of all the individuals in the 
sample to relatively few clusters. Individuals in the population may potentially come from more 
clusters. When more data become available, some data may show patterns that do not belong to 


previous data. The semiparametric Bayesian model automatically adjusts itself. However, for 
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growth mixture modeling, we may need to fit a different set of models when more data are 
included in the study because a larger sample size probably indicates more clusters. 

Note that the comparison between the semiparametric Bayesian growth curve modeling and 
finite growth mixture modeling is not conducted in our simulation study. As we explained, the 
semiparametric Bayesian method is essentially a mixture modeling with infinite number of latent 
classes. Therefore, as demonstrated in Dahlin et al. (2016), finite mixture modeling can be seen as 
an approximation of the infinite mixture modeling. Their numerical illustration showed that when 


the data record is finite, the two approaches reach the same conclusion. 


Comparison of the Two Robust Approaches through a Simulation Study 


The purpose of this simulation study is to compare the semiparametric Bayesian method 
with the existing robust method using Student’s ¢ distributions (Zhang et al., 2013) for growth 
curve modeling, in analyzing symmetric data, skewed data, mixture data, and data with outliers. 


The traditional normal-based method serves as a baseline for the comparison. 


Study design 


We focus on the linear growth curve model as discussed previously. In the model (see 
Figure 1), the fixed effects are given by 8 = ((, Bs) = (0.2, 0.3). These numbers are 
reasonable as they are the estimated fixed effects in the real data example. From Tong and Zhang 
(2012) and (Tong, 2014), it is known that the number of measurement occasions, the covariance 
between the latent intercept and slope, and the variance of the measurement errors do not 
significantly affect the relative performance of the robust methods. Therefore, we fix 7’ = 4, 

o2 = 0.5, and ops = 0 in this study. 

We study the effects of two factors - sample size and population distribution of data. Three 
different sample sizes are evaluated: N = 50, 200, and 500. The influence of population 
distribution on the three methods is evaluated by manipulating the distributions of intraindividual 
measurement errors of the growth curve model. In total, seven distributional conditions are 


considered, as shown in Table 2. As shown in Cain et al. (2017), these distributional conditions 
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have been observed in practice. For condition C1, data are generated from normal distributions as 
the baseline condition in comparison to nonnormal cases. For condition C2, data are still 
symmetrically distributed. However, they have longer-than-normal tails as the kurtosis is bigger. 
From conditions C3 to C4, the intraindividual measurement errors are generated from nonnormal 
distributions, with an increase of skewness and kurtosis. For condition C5, we generate mixture 
data with half of individuals’ measurement errors following N(0, 02) and the other half following 
N (0,502). Data in conditions C6 and C7 are generated with outliers in the intraindividual 
measurement errors. The number of outliers is 10 at each measurement occasion. We consider the 
effects of the geometry of outliers on the performance of the two models. Outliers can locate on 
one side or both sides of the population distribution, resulting in asymmetric or symmetric sample 
distributions, respectively. Outliers on one side of the population distribution are generated from 
the normal distribution e; ~ N(7o,, 02), and outliers on both sides of the population distribution 
are generated from either e; ~ N (70,02) or e; ~ N(—To-, 02) with equal probability. The 
outliers are generated following the definition of outliers in growth curve modeling (Tong and 
Zhang, 2017; Yuan and Zhong, 2008). 

For each simulation condition, a total of 500 data sets are generated. For each data set, the 
traditional normal-based method, robust method using ¢ distributions, and semiparametric 
Bayesian method! are applied separately using free software R and OpenBUGS (Lunn et al., 
2013). The programming code is available on this webpage: 
https://www.dropbox.com/sh/njhhnm4tdi4ahp9/AADXXBbB79qvp0x37 Y ZSNeK7a?dl=0. We 


obtain the parameter estimate, bias, relative bias”, standard error, mean squared error (MSE), and 


' Notice that for semiparametric Bayesian method, the maximum of the potential number of mixture components C 
is set at 20 in the simulation. This is reasonable because the estimated number of clusters is always below 20 based 
Tong (2014). So setting C at 20 or a larger value such as 50 does not affect the model estimation. But a very small 
value of C’ (e.g., 5) may reduce the accuracy of the estimation. In addition, the hyperparameter a: is specified to 


follow Gamma(2, 2), as suggested by Ishwaran (2000). 


6 x 100% 0=0 : 
> relative bias = ¢ » , where @ is the true parameter value and @ is the estimate of 0. 


6-6 
x 100% 040 
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coverage probability (CP) of the 95% highest posterior density (HPD) credible intervals? for each 
parameter. In addition, Geweke tests (Geweke, 1992) are used to assess the convergence of 


Markov chains for all simulation replications. 


Note that for the traditional growth curve modeling, priors of the model parameters are 
selected following Zhang et al. (2013). Fixed effects 6 has a noninformative diffuse prior 
N(0, 10°). The covariance matrix of the latent intercept and slope © has an inverse-Wishart prior 
with an indentity scale matrix and degrees of freedom being 2. The variance of the measurement 
errors a? also has a diffuse prior inverse — gamma(0.001, 0.001). For the robust growth curve 
modeling using ¢ distributions, the prior for the degrees of freedom of the ¢ distribution is a 
uniform distribution U(0, 100). For the semiparametric Bayesian growth curve modeling, DP 
mixture prior is used for the intraindividual measurement error e; with a ~ Gamma(2, 2) and 
@") ~ IW(no, Wo) where no is fixed at 2 and the diagonal elements of Wo have inverse-gamma 


distributions. Detailed prior information is also provided in the shared OpenBUGS code. 


Main results. Table 3 presents the detailed parameter estimation results from the two 
robust estimation methods as well as the baseline normal-based method for conditions Cl when 
the sample size is 200. When sample size is 50 or 500, similar patterns are observed. For 
conditions C2-C7, data are not normally distributed. Both robust methods outperform the 
normal-based method, which is consistent with the existing literature (e.g., Tong and Zhang, 
2012). In this section, we mainly compare the two robust methods based on the average of MSE 
and CP across parameters. Geweke tests suggest that all cases are converged. Detailed estimation 
results for all parameters in all conditions are available on the webpage: 
https://www.dropbox.com/sh/njhhnm4tdi4ahp9/AADXXBbB79qvp0x37 Y ZSNeK7a?dl=0. We 


summarize our main findings below. 


3 Posterior credible interval, also called credible interval or Bayesian confidence interval, is analogical to the 
frequentist confidence interval. The 95% HPD credible interval [1, u] satisfies: 1. Prob(l < 0 < u|data) = 0.95; 2. 
for 0; € [l, u] and 05 ¢ [1, u], Prob(0,|data) > Prob(62|data). In general, HPD intervals have the smallest volume 


in the parameter space of 0, and numerical methods have to be used to find HPD intervals. 
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For condition Cl (e1,...,er ~ N(0,07)), Table 3 shows that the two robust methods and 
the traditional normal-based method perform equally well because the bias, empirical standard 
errors, MSEs and CPs for the estimated parameters are similar. For the robust method using t¢ 
distributions, the estimated degrees of freedom is 55.661, indicating that the intraindividual 
measurement errors e; are most likely to be normally distributed. For the semiparametric 
Bayesian method, the estimated number of clusters of e; is 5.150, meaning that there are 5 
different clusters for the measurement errors. The estimated K, is related to the DP precision 
parameter a. In this study, the estimated value of a is 0.928 given the prior a ~ Gamma(2, 2). 
Notice that the estimated number of clusters of e; is larger than 1 for normally distributed data. 
We may suspect whether there are overfitting problems. Since semiparametric Bayesian methods 
are data driven and a governs the model complexity to some degree, it is not prone to overfitting 
in theory. Since parameter estimations are very similar across the three methods applied to 
analyze the data, we do not further investigate overfitting problems in this article. More will be 
discussed in the discussion section. 

For conditions C2-C7, both robust methods outperform traditional normal-based method in 
terms of bias and standard error. So we only summarize results from robust methods into Table 
4*, Results from condition C1 are also included in this table as a benchmark. We calculate MSE 
and CP for each model parameter and average them over certain model parameters for each 
simulation condition to compare the two types of robust growth curve models. For conditions 
C1-C5, MSEs and CPs are averaged over all the six model parameters (7, 3s, o. ee; ors, and 
o?. For conditions C6 and C7, MSEs and CPs are only averaged over 3;, 3s, 07, 02, and ozs, 
since the population parameter value for a? is unkown due to the existence of outliers for the 
intraindividual measurement errors. In Table 4, on the rows are the different types of generated 
data and on the columns are the two types of robust methods used to analyze the generated data. 

(1) As we discussed, for condition C1, the two robust methods perform about the same. 


When sample size is large, the semiparametric Bayesian method leads to CPs closer to the 


4 Note that all the detailed parameter estimation results are provided on our webpage in the same structure as Table 3. 
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nominal level 0.95. 


(2) From conditions C2 to C4, the performance of the semiparametric Bayesian method 
becomes better than the performance of the robust method based on ¢ distributions, and this 
pattern gets more and more prominent as data deviate more from normal distributions. For 
condition C2 (e;,...,er ~ tis) (0, o2)), the two methods perform similarly in estimating fixed 
effects and random effects related parameters 3, 35, 07, 02, and os. For the estimated o?, the 
robust method based on ¢ distributions leads to a larger bias, a larger standard error, and thus a 
larger MSE and a worse CP. This is because the intraindividual measurement errors are generated 
from rescaled ¢ distributions. Although the distribution of e; is still symmetric, it is not distributed 
as the regular Student’s ¢ distribution. The estimated degrees of freedom is far from precise as the 
standard error of it is 9.58. For the semiparametric Bayesian method, the estimated number of 
clusters of e; is 7.60, which is bigger than that in condition Cl, meaning that there are more 
clusters for the measurement errors when they are not normally distributed. For conditions C3 
and C4, the distribution of e; is not symmetric anymore. The semiparametric Bayesian method 
performs consistently better than the t-distribution-based method for all the parameter estimates, 
especially the variance of measurement errors 02. Even for the estimation of the fixed effects 
parameter 3, the difference between the performance of the two robust methods is obvious. For 
example, by switching from the robust t-distribution-based method to the semiparametric 
Bayesian method in condition C3, the bias of B zt is reduced by (0.081-0.021)/0.081=74%, and the 
CP increases from 0.80 to 0.94. Therefore, the semiparametric Bayesian method is more robust to 
the skewed data than the robust method using ¢ distributions. For highly skewed data with heavy 
tails (e.g., condition C4), the bias of some parameter estimates (e.g., 7) from the 
t-distribution-based method can be as high as 10 times bigger than that from the semiparametric 
Bayesian method. Given such results, the robust Bayesian method using ¢ distributions should not 


be used for highly skewed data when comparing to the semiparametric Bayesian method. 


Note that the df, can control the robustness for the ¢-distribution-based method (e.g., Zhang 


et al., 2013) and the AK, can control the robustness for the semiparametric Bayesian method. From 
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conditions C1-C4, data distributions are getting more severely different from normal distributions 
with an increase of skewness and kurtosis. As a result, the estimated df, decreases from 55.661 to 
1.667 for the t-distribution-based method, and the estimated K, increases from 5.150 to 9.812 for 
the semiparametric Bayesian method. Although both robust models have their accommodations 
to the distributional deviance from normal distributions, the semiparametric Bayesian method 


seems to be more robust to data generated from nonnormal populations. 


(3) For condition C5 with mixture data, the semiparametric Bayesian method also 
outperforms the ¢-distribution-based robust method. Particularly, the estimated parameters from 
the semiparametric Bayesian method are more accurate and efficient. The CP is improved 
substantially when the semiparametric Bayesian method is used. For the semiparametric 
Bayesian growth curve model, the estimated K, is 7.5557, bigger than that in condition Cl, 


indicating that more clusters are needed for mixture data analysis. 


(4) For conditions C6 and C7 where data contain outliers, the two robust methods provide 
similar MSEs and CPs regardless of the geometry of the outliers. By comparing conditions C3 
and C4 with condition C6, we notice that although data under those conditions are all skewed, the 
comparisons between the performance of the two robust methods are different. This is consistent 
with the existing literature (Zhang et al., 2013). Although Student’s ¢ distribution is symmetric 
and sensitive to skewed data, practically, we find that if the skewness in the data is caused by 
outliers, the t-distribution-based robust method can still perform very well. However, if the 
skewness is because of skewed distributions such as the Gamma distribution and the lognormal 
distribution, the t-distribution-based robust method may break down. The t-distribution-based 
method can downweight potential outliers, but cannot be used to approximate systematical 
skewed distributions. In contrast, the semiparametric Bayesian model is more robust to skewed 


data and performs stably well. 
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General conclusions 


In sum, we draw the following conclusions. First, the two robust methods outperform the 
traditional normal-based growth curve modeling when data are nonnormal. When data are 
normally distributed, the performance of the robust methods is as good as that of the traditional 
normal-based method. Second, the semiparametric Bayesian method can analyze skewed data 
and mixture data more accurately and precisely than the robust method using ¢ distributions in 
growth curve modeling. Third, in modeling data with outliers, the two robust approaches perform 
equally well. Fourth, the increase of the sample size can often improve the performance of all 


three estimation methods, but it does not affect their relative performance. 


An Illustrative Example 


To demonstrate the application of the two robust approaches, we investigate a subset of data 
from the National Longitudinal Survey of Youth 1997 (NLSY97) Cohort (Bureau of Labor 
Statistics, U.S. Department of Labor, 2005)°. In the study, 512 students’ Peabody Individual 
Achievement Test (PIAT) mathematics scores were collected yearly from the 7th grade to the 10th 
grade. The trajectory plot for all the individuals (Figure 2) suggests a linear growth pattern for the 
development of math abilities. The descriptive statistics of the data (Table 5) reveal that the 
skewness and kurtosis of the data at grades 9 and 10 are significantly different from those of 
normal distributions. The PIAT math scores at each year are skewed to the left. Thus, it is 
reasonable to consider the data as nonnormal. As a consequence, we will use this dataset to 


illustrate the application of the robust methods. 


A linear growth curve model is fitted to the data and two robust methods are used for model 


> In addition to the illustrative example in this article, we also provide another example to help substantive researchers 
understand the application of the proposed robust method in growth curve modeling, using data from the Virginia 
Cognitive Aging Project. The detailed description of this example as well as the programming code are available on 


our webpage: https://www.dropbox.com/sh/njhhnm4tdi4ahp9/AADXX BbB79qvp0x37 YZSNeK7a?dl=0. 
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estimation, including the ¢-distribution-based robust method and the semiparametric Bayesian 
method. The traditional normal-based method is also applied to serve as a baseline. The 
parameter estimates are provided in Table 6. Differences among the three methods are shown. 
Given the conclusions from the simulation study, the results from semiparametric Bayesian 
methods are more reliable. The average initial mathematical ability at grade 7 is about 6.168 with 
an average growth rate of 0.314 from grade 7 to grade 10. The credible intervals for estimated o? 
and 0% do not cover zero, indicating that there are interindividual differences in the initial ability 
and the rate of change, respectively. Contrary to the traditional growth curve modeling with 
normal assumptions which fails to detect a correlation between latent intercept and slope, the 
semiparametric Bayesian method (as well as the ¢-distribution-based robust method) reports a 
negative association between initial math abilities and math ability growth rates since the credible 
interval of the estimate o,,5 does not cover zero. Specifically, estimation results suggest that 
children initially with lower math abilities exhibit higher growth rates in their math abilities from 
grade 7 to 10. The contradictory results between the traditional normal-based method and the 
robust methods are likely related to the width of the estimated credible intervals. In the presence 
of nonnormality, robust methods typically provide more precise parameter estimates, and thus the 
credible intervals obtained from robust methods are likely to be narrower than the corresponding 
intervals obtained from the traditional normal-based method, which is exactly the case in this 
example. Given that the correlation between the latent intercept and slope parameters across 
sessions are interested in many studies (e.g., Zhang, Davis, Salthouse, and Tucker-Drob, 2007a), 
the traditional normal-based growth curve modeling is not recommended to use when the 
nonnormality is suspected. The example here documents evidence favoring the semiparametric 


Bayesian approach as it detects the existence of an effect more often than the traditional method. 


Discussion 


In this article, we proposed a semiparametric Bayesian approach for growth curve analysis 


with nonnormal data. The normal distributions of the intraindividual measurement errors of the 
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traditional growth curve model were replaced by random distributions with DP mixture priors. In 
general, being Bayesian allows the model to avoid having to choose a single value for each 
parameter, and being semiparametric enables the model to increase the dimensionality of the 
parameters as more data become available. The performance of the semiparametric Bayesian 
method was systematically evaluated through a simulation study. We further compared the 
semiparametric Bayesian method to the robust method using Student’s ¢ distributions (e.g., Tong 
and Zhang, 2012) as well as the traditional normal-based method to illustrate the intrinsic 


characteristics of the three approaches. 


From Tong and Zhang (2012) and Tong (2014), it was known that the number of 
measurement occasions, the potential number of clusters, the covariance between the latent 
intercept and slope, and the variance of the measurement errors did not affect the performance of 
the robust method using Student’s ¢ distributions and the semiparametric Bayesian method. For 
simplification, we fixed these four factors and only considered the effect of sample size and 
distribution of the population in this study. Overall, the traditional normal-based method should 
not be used when data are nonnormal. In general, the semiparametric Bayesian method 
outperformed or performed as well as, the robust method using Student’s ¢ distributions. The 
semiparametric Bayesian approach is more robust to the nonnormality, especially when the 
nonnormality of the data is caused by a nonnormal population. For example, when the 
measurement errors followed Gamma or lognormal distributions, a random distribution with DP 
mixture priors captured data better than a ¢ distribution because the ¢ distribution has a parametric 
form and the shape of it is far away from the shape of a Gamma or a lognormal distribution. Thus, 
we prefer to use robust semiparametric Bayesian method to analyze such data. If the 
nonnormality is caused by data contamination such as the existence of outliers, the robust method 
using ¢ distributions can perform as well as the semiparametric Bayesian method. In practice, it is 
difficult to distinguish samples with contamination from samples with a nonnormal population 
distribution. In general, we recommend using semiparametric Bayesian methods when 


nonnormality is suspected. 
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Note that we only compared the proposed semiparametric Bayesian method to the robust 
method using Student’s ¢ distributions. In the Bayesian growth curve modeling literature, we are 
aware of one paper (Zhang, 2013) about modeling measurement errors using different type of 
distributions including skew-? distributions. Zhang (2013) concluded that one can avoid the loss 
in the efficiency of standard error estimates when the distribution of the errors is correctly 
specified. Namely, if data distribution is symmetric and have heavy tails, ¢ distributions should be 
used. If data distribution is skewed, skew-normal distributions should be used. We can compare 
the proposed semiparametric Bayesian method to robust methods based on those distributions 
when skewness of the data is suspected. Overall, our study proposed a semiparametric Bayesian 
approach and we want to show that it outperforms parametric modeling. The robust growth curve 
modeling using t distributions is just one representative of parametric robust growth curve 
analyses. It can be changed to skew-normal distributions which still have parametric forms. 
Correspondingly, the infinite mixture of normal distributions in the semiparametric Bayesian 
approach can be changed to the infinite mixture of skew-normal distributions if necessary. 
Although we did not compare the skew-normal-based robust method to the DP mixture modeling 
based on a mixture of skew-normal distributions in the simulation study with asymmetric data, we 
believe the current comparison is still important as to compare parametric and semiparametric 
Bayesian analyses. In addition, the robust method using Student’s ¢ distributions is adopted in our 
study because it is relatively more broadly used in practice as software has been developed to 
facilitate the implementation. Therefore, the conclusion from our manuscript has its practical 


merits. 


Each model expresses a data generative process with hidden parameters. Posterior inference 
is akin to reversing the data generative process to find the hidden structure with which the 
observed data can most likely be generated. What distinguishes semiparametric Bayesian models 
from other Bayesian models is that the hidden structure is assumed to grow with the data. Its 
complexity (e.g., the number of mixture components) is part of the posterior distribution. As 


illustrated in the simulation study, it is determined as part of the data analysis rather than specified 
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a priori. Namely, the semiparametric Bayesian method has the advantage of building a Bayesian 
model on an infinite-dimensional® parameter space that can be evaluated on a finite sample in a 
manner that uses only a finite subset of the available parameters to explain the sample. 

The employment of semiparametric Bayesian methods 1s a field still in its infancy. New 
Dirichlet process variants and generalizations more suitable to specific applications are being 
found every year. In our current semiparametric Bayesian growth curve models, only the 
distribution of intraindividual measurement errors was replaced by an unknown distribution with 
the DP mixture prior. The similar strategy can be taken for random effects. Based on Tong 
(2014), if both measurement errors and random effects are allowed to follow unknown random 
distributions with DP mixture priors, the model convergence rate would be relatively lower. 
Further study (e.g., using longer Markov chains, picking more informative priors, etc.) needs to 
be conducted to resolve this issue. We also want to point out that although this study focuses on 
simple unconditional linear growth curve models, the same methods work for more complicated 
growth curve models such as linear growth curve models with covariates and nonlinear growth 
curve models as well. 

The DP precision parameter a governs the expected number of clusters. A smaller value of 
a results in a smaller number of clusters. As suggested by Ishwaran (2000), the prior 
a ~ Gamma(2, 2) is used in our study to encourage both small and large values. Theoretically, a 
noninformative prior for a may yield a better estimation of the number of clusters if no prior 
information is available, but usually causes non-convergence issues or a much longer time for the 
Markov chain to converge in practice (e.g., Jara, Hanson, Quintana, Miiller, and Rosner, 2011; 
Ohlssen, Sharples, and Spiegelhalter, 2007; West, 1992). 

We also want to point out that although normal distributions were replaced by random 
distributions with DP mixture priors in the robust semiparametric Bayesian method, the 
distribution used in forming the DP mixture is still normal in this study. However, the base 


distribution can be nonnormal and chosen in different ways depending on the purpose of analysis. 


© Infinite-dimensional can be interpreted as of finite but unbounded dimension. 
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If density estimation covers the entire real space, one may use a location-scale kernel such as the 
normal density or the Student’s ¢ density. If the density estimation only covers half of the real 
space, gamma, log-normal and Weibull mixtures seem to be more appropriate. On the unit 
interval, mixtures of beta densities can be considered. Special shapes can be produced by special 


types of mixtures and need to be considered case by case. 


For model comparisons, we only compared the parameter estimation. How well the models 
fit the data is not evaluated. Deviance Information Criterion (DIC) is widely used to evaluate the 
model fit in Bayesian analysis. Despite the popularity of DIC, it has received much criticism since 
it was proposed (Spiegelhalter, Best, Carlin, and Linde, 2002). Celeux, Forbers, Robert, and 
Titterington (2006) argued that the DIC introduced by Spiegelhalter et al. for model assessment 
and model comparison was directly inspired by linear and generalized linear models, but was open 
to different possible variations in the setting of models involving random effects, as in our robust 
growth curve models. Many ways to compute DICs have been proposed in Celeux et al. (2006). 
However, the calculation of DIC in semiparametric Bayesian analysis has not been studied, but 


should be considered in the future since DIC is an important index for model comparison. 


It is known inflexible models often yield unreasonable inference. For example, complex 
models may cause overfitting problems. Since semiparametric Bayesian methods provide a way 
of getting very flexible model and are to fit a single model that can adapt its complexity to the 
data, it should be an effective guard against overfitting in theory. As stated in the literature (e.g., 
Niekum, 2015), DP mixture is used to find an appropriate number of mixture components and 
their associated parameters that best explain the data without overfitting in a fully Bayesian 
manner. Dunson (2013) pointed out that the concerns about overfitting in semiparametric 
Bayesian are largely unfounded. In practice, even though there are infinitely many components, 
only a few of these are allocated and the model behaves like a finite mixture model with sieve 
behavior in terms of using more components as the sample size increases. “Contrary to the 
concern about overfitting, the tendency is instead to place a high posterior weight on very few 


components, potentially underfitting in small sample sizes”. As far as we are aware of, no study 
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has systematically evaluated the overfitting or underfitting problems. More research is needed to 


formally address these issues. 
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Table 1 
Different percentiles (5%, 50%, 95%) of the distribution of the number of clusters K, given 


different values of precision parameter a and sample size N 


a=0.1 a=1 a=2 
5% 50% 95% 5% 50% 95% 5% 50% 95% 
N = 200 1 1 3 3 7 11 7 13 19 
N = 500 1 1 3 4 8 12 9 14 21 
N = 1000 1 2 3 4 8 13 10 16 23 
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Table 2 


Seven distributional conditions used for the Monte Carlo simulation 


Condition Distribution of e; — er Kurtosis of e; — er Skewness of e; — er 
Cl N(0, 02) 0 0 
C2 t(5) (0, G.) 6 0 
C3 T,1) (0, 02) 6 2 
C4 LNo,1) (0,02) 110.936 6.185 
C5 Mixture of (0,02) and N(0, 502) - - 
C6 N(0,02) with 10 outliers on one side - - 
C7 N (0,02) with 10 outliers on both sides - - 


Note. tiapy(H, 07), (ais) (ut, 07), and LN (uy ,02,) (bs o”) represent the rescaled ¢ distribution, the rescaled Gamma distribution, and the 


rescaled lognormal distribution, respectively, with mean yu and variance o?. 
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Table 4 


Mean squared errors and coverage probabilities for different data conditions 


N =50 N = 200 N = 500 
Robust t Semiparametric Robust ¢ Semiparametric Robust ¢ Semiparametric 
Cl MSE 0.019 0.023 0.005 0.005 0.002 0.002 
CP 0.927 0.923 0.934 0.942 0.931 0.944 
C2 MSE 0.021 0.021 0.005 0.005 0.002 0.002 
CP 0.907 0.924 0.899 0.908 0.900 0.913 
C3 MSE 0.020 0.019 0.006 0.005 0.003 0.002 
CP 0.892 0.912 0.861 0.902 0.825 0.905 
C4 MSE 0.037 0.039 0.010 0.008 0.006 0.004 
CP 0.819 0.845 0.785 0.856 0.699 0.828 
C5 MSE 0.053 0.043 0.015 0.012 0.007 0.005 
CP 0.883 0.904 0.879 0.903 0.884 0.907 
C6 MSE 0.180 0.158 0.006 0.011 0.003 0.003 
CP 0.859 0.828 0.961 0.907 0.950 0.936 
C7 MSE 0.037 0.043 0.006 0.007 0.003 0.002 
CP 0.971 0.977 0.962 0.963 0.957 0.967 


Table 5 


Descriptive statistics of the PIAT math data from NLSY97 


Grade Mean s.d.  Skewness_ Kurtosis 
Z 6.071 1.312 ~ -0.110 3.392 
8 6.590 1.392  -0.168 3.336 
9 6.796 1.419 -0.564* 4.814% 
10 7.044 1.325 -0.344* 3.708* 
Note. s.d. = standard deviation. The “*” sign indicates that the corresponding statistic is signifi- 


cantly different from that of a normal distribution. The significance of skewness is tested through 


the D’ Agostino test, and the significance of kurtosis is tested through the Anscombe-Glynn test. 
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Table 6 

Parameter estimates from the traditional method as well as the two robust approaches 

Normal-based Robust t Semiparametric 

Parameters Est. CLE Clu Est. CLL = CLU Est. CLL CLU 
Br 6.157 6.047 6.269 6.153 6.043 6.263 6.168 6.058 6.276 
Bs 0.312 0.275 0.349 0.319 0.283 0.352 0.314 0.280 0.349 
Ge 1.125 0.937 1.337 1.206 1.019 1.414 1.153 0.974 1.357 
o2 0.035 0.024 0.049 0.042 0.029 0.057 0.040 0.028 0.055 
OLS -0.034 -0.081 0.009 -0.050 -0.096  -0.008 -0.049 -0.093  -0.009 
a 0.748 0.694 0.806 0.729 0.698 0.762 0.737 0.698 0.781 
df. - - - 4.143, 3.222 5.277 - - - 
# of clusters - - - - - - 11.060 5.000 18.000 


Note. Est = estimate; CI.L = lower limit of the 95% credible interval; CI.U = upper limit of the 


95% credible interval. 
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Figure 1. Path diagram of a linear growth curve model. The numbers in the path diagram are 


population parameter values used in the simulation. 
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Figure 2. A collection of individual trajectories for the PIAT math data from NLSY97. A total of 


512 students are measured at 4 occasions. The red line represents the mean growth trajectory. 


